mean opinion score
- North America > United States (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis
Kalahroodi, Mohammad Javad Ranjbar, Faili, Heshaam, Shakery, Azadeh
Existing Persian speech datasets are typically smaller than their English counterparts, which creates a key limitation for developing Persian speech technologies. We address this gap by introducing ParsVoice, the largest Persian speech corpus designed specifically for text-to-speech(TTS) applications. We created an automated pipeline that transforms raw audiobook content into TTS-ready data, incorporating components such as a BERT-based sentence completion detector, a binary search boundary optimization method for precise audio-text alignment, and audio-text quality assessment frameworks tailored to Persian. The pipeline processes 2,000 audiobooks, yielding 3,526 hours of clean speech, which was further filtered into a 1,804-hour high-quality subset suitable for TTS, featuring more than 470 speakers. To validate the dataset, we fine-tuned XTTS for Persian, achieving a naturalness Mean Opinion Score (MOS) of 3.6/5 and a Speaker Similarity Mean Opinion Score (SMOS) of 4.0/5 demonstrating ParsVoice's effectiveness for training multi-speaker TTS systems. ParsVoice is the largest high-quality Persian speech dataset, offering speaker diversity and audio quality comparable to major English corpora. The complete dataset has been made publicly available to accelerate the development of Persian speech technologies. The ParsVoice dataset is publicly available at: https://huggingface.co/datasets/MohammadJRanjbar/ParsVoice.
ArFake: A Multi-Dialect Benchmark and Baselines for Arabic Spoof-Speech Detection
Maged, Mohamed, Ehab, Alhassan, Mekky, Ali, Hassan, Besher, Shehata, Shady
With the rise of generative text-to-speech models, distinguishing between real and synthetic speech has become challenging, especially for Arabic that have received limited research attention. Most spoof detection efforts have focused on English, leaving a significant gap for Arabic and its many dialects. In this work, we introduce the first multi-dialect Arabic spoofed speech dataset. To evaluate the difficulty of the synthesized audio from each model and determine which produces the most challenging samples, we aimed to guide the construction of our final dataset either by merging audios from multiple models or by selecting the best-performing model, we conducted an evaluation pipeline that included training classifiers using two approaches: modern embedding-based methods combined with classifier heads; classical machine learning algorithms applied to MFCC features; and the RawNet2 architecture. The pipeline further incorporated the calculation of Mean Opinion Score based on human ratings, as well as processing both original and synthesized datasets through an Automatic Speech Recognition model to measure the Word Error Rate. Our results demonstrate that FishSpeech outperforms other TTS models in Arabic voice cloning on the Casablanca corpus, producing more realistic and challenging synthetic speech samples. However, relying on a single TTS for dataset creation may limit generalizability.
- Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.26)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Africa > Middle East > Egypt (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.05)
- Asia > China > Hong Kong (0.05)